Weighted Sequencing from Compomers: DNA de-novo sequencing from mass spectrometry data in the presence of false negative peaks
نویسنده
چکیده
One of the main endeavors in today’s Life Science remains the efficient sequencing of long DNA molecules. Today, most de-novo sequencing of DNA is still performed using electrophoresis-based Sanger Sequencing introduced in 1977, in spite of certain restrictions of this method. Recently, we proposed a new method for DNA sequencing using base-specific cleavage and mass spectrometry, that appears to be a promising alternative to classical DNA sequencing approaches: Among its benefits is the extremely fast data acquisition of mass spectrometry. This leads to the combinatorial problem of Sequencing From Compomers (SFC), and to the definition of sequencing graphs. Simulations indicate that this method may allow for de-novo sequencing of DNA molecules with 200+ nt. An open problem in the context of SFC is that it does not take into account false negative peaks (missing peaks) that are common for real-world mass spectra. Here, we present a natural generalization of SFC, the Weighted Sequencing from Compomers (WSC) Problem, that allows us to cope with false negative peaks. We also show that the family of graphs introduced to solve SFC, can be generalized to capture the new aspects of WSC. Finally, we present a branch-and-bound algorithm to find all sequences that agree with the sample mass spectra with the exception of some missing peaks.
منابع مشابه
Sequencing from compomers in the presence of false negative peaks
One of the main endeavors in today’s Life Science remains the efficient sequencing of long DNA molecules. Today, most de-novo sequencing of DNA is still performed using electrophoresis-based Sanger Sequencing, based on the Sanger concept of 1977. Methods using mass spectrometry to acquire the Sanger Sequencing data are limited by short sequencing lengths of 15–25 nt. Recently, we proposed a new...
متن کاملSequencing from Compomers: Using Mass Spectrometry for DNA De-Novo Sequencing of 200+ nt
One of the main endeavors in today's life science remains the efficient sequencing of long DNA molecules. Today, most de novo sequencing of DNA is still performed using the electrophoresis-based Sanger concept of 1977, in spite of certain restrictions of this method. Methods using mass spectrometry to acquire the Sanger sequencing data are limited by short sequencing lengths of 15-25 nt. We pro...
متن کاملPEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry.
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing softwa...
متن کاملPEAKS: Powerful Software for Peptide De Novo Sequencing by MS/MS
A number of different approaches have been described to identify proteins from tandem mass spectrometry (MS/MS) data. The most common approaches rely on the available databases to match experimental MS/MS data. These methods suffer from several drawbacks and cannot be used for the identification of proteins from unknown genomes. In this communication, we describe a new de novo sequencing softwa...
متن کاملPepNovo: de novo peptide sequencing via probabilistic network modeling.
We present a novel scoring method for de novo interpretation of peptides from tandem mass spectrometry data. Our scoring method uses a probabilistic network whose structure reflects the chemical and physical rules that govern the peptide fragmentation. We use a likelihood ratio hypothesis test to determine whether the peaks observed in the mass spectrum are more likely to have been produced und...
متن کامل